Possibilistic conditional independence: A similarity-based measure and its application to causal network learning
Authors
R. Sangüesa et al. (corresponding author, e-mail: ues@lsi.upc.es)
Abstract
A definition for similarity between possibility distributions is introduced and discussed as a basis for detecting dependence between variables by measuring the similarity degree of their respective distributions. This definition is used to detect conditional independence relations in possibility distributions derived from data. This is the basis for a new hybrid algorithm for recovering possibilistic causal networks. The algorithm POSSCAUSE is presented and its applications discussed and compared with analogous developments in possibilistic and probabilistic causal network learning. © 1998 Elsevier Science Inc.

1. Learning causal networks: The possibilistic case

As more and more databases are used as a source for Knowledge Discovery [38], the interest of automating the construction of a well-defined and useful knowledge representation such as belief networks [35,34,33] becomes apparent. Several methods have been devised to recover both the structure and the probability distributions corresponding to it. Such methods can be roughly divided into quality-of-implicit-distribution methods [6,23,22], conditional-independence-based methods [40,37,48] and hybrid methods [47,46]. The first ones construct tentative belief networks by using a measure of the quality of the distribution implied by the DAG being built. Current approaches use as a quality measure the a posteriori probability of the network given the database [6], the entropy of the distribution of the final DAG [5] and the Minimum Description Length of the network [29], which is related to information criteria [2]. The second family of methods uses tests for conditional independence between variables to recover a tentative dependency model of the domain, and from this and the independence properties a possible DAG structure is selected. Methods of this class differ in the type of structure they are able to construct: polytrees [34], simple DAGs [25] or general DAGs [45]. Finally, hybrid methods combine the first and second kind of methods in order to recover a network. For example, the CB algorithm uses dependence tests to recover a structure and uses a topological order on the resulting DAG to guide the K2 algorithm [47]. For a wider and more detailed discussion of current network learning methods see [41].

All these methods have been applied using a single uncertainty formalism, i.e., probability. However, uncertainty about a domain can be due to other factors beyond those for which probability is adequate. When imprecision or ambiguity are inherent to the domain, possibility theory [11,20] is a good alternative. These circumstances (imprecision and ambiguity) do arise in many real-world situations. For example, data may come from multiple sensors with unknown fault probability [27]. Some tasks, too, may have some degree of ambiguity, as is the case in diagnosis when there is added uncertainty about symptoms being related to more than one fault in a non-exclusive way [12]. The idea that belief networks can use uncertainty formalisms other than probability is, thus, a natural development. Several alternative formalizations exist: valuation-based systems [44,3], possibilistic networks [15,14,13], and probability intervals [7]. Due to the peculiar characteristics of such formalisms, new learning methods have been devised.
In the context of possibilistic networks some interesting work has been done by Gebhardt and Kruse [21] in creating a learning method for possibilistic networks along lines similar to previous work in Bayesian learning [6]. Our aim in this paper has been to develop a method for building possibilistic networks that reflects in a consistent way all the dependence relations present in a database, but that also recovers the most precise distribution from a database of imprecise cases, which is a problem that we encountered in the domains we are presently working in [41,42]. So possibility theory was a natural choice. Problems, however, arose from several shortcomings of the current possibilistic counterparts of concepts such as independence, conditioning and measurement of possibilistic information. So we have put forth new definitions and measures that have proven quite useful in our work.

The organization of this paper is as follows. In Section 2 we review the basic concepts of extended belief networks, conditioning and independence in possibilistic settings; in Section 3 a new measure of possibilistic dependence is discussed that combines similarity and information relevance concepts; Section 4 shows how this measure can be applied to a learning method and presents two new algorithms, HCS and POSSCAUSE. The first one is a hybrid variation on a previously existing algorithm due to Huete [25]; POSSCAUSE (Possibilistic Causation) is an extension to general DAGs. We comment in Section 5 on the results of applying them to a well-known test database. Section 6 is devoted to concluding remarks and future lines of research.

2. General belief networks and possibilistic causal networks

Here we modify the notion of belief network, which is usually identified with Bayesian networks.

Definition 2.1 (General belief network). For a domain $U = \{x_1, \ldots, x_n\}$ the corresponding belief network is a directed acyclic graph (DAG) where nodes stand for variables and links for direct associations between variables. Each link is quantified by the conditional uncertainty distribution π relating the variables connected to it. By uncertainty distribution we mean a distribution based on any confidence measure used to represent uncertainty about evidence.

Belief networks have two interesting characteristics. Firstly, any given node $x_i$ in a belief network is conditionally independent of the rest of the variables in U given its direct predecessors in the graph, i.e., its parents shield the variable from the influence of the previous variables in the graph. Secondly, the joint uncertainty distribution induced by the DAG representing the dependencies in a given domain can be factorized into the conditional distribution of each variable with respect to its immediate predecessors (parents). That is,

$$\pi(x_1, \ldots, x_n) = \bigotimes_i \pi(x_i \mid pa_i),$$

where $pa_i$ is the set of direct parents of variable $x_i$, π represents an uncertainty distribution (probability, possibility, etc.) and $\otimes$ is a factorizing operator. In the case of probability this operator is the product of conditional distributions [33]; in the case of possibility it can be the product or the minimum operator [15].

Definition 2.2 (Possibilistic causal network). Possibilistic belief networks are belief networks where the underlying uncertainty distribution is the possibility distribution defined on U corresponding to the graph.
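To make the factorization in Definitions 2.1 and 2.2 concrete, the following is a minimal sketch, not taken from the paper: a hypothetical two-node possibilistic network X → Y with invented possibility values, where the factorizing operator can be the minimum (min-based networks) or the product (product-based networks).

```python
from itertools import product as cartesian

# Hypothetical two-node network X -> Y with binary domains; all numeric
# possibility values below are illustrative, not from the paper.
variables = ["X", "Y"]
domains = {"X": [0, 1], "Y": [0, 1]}
parents = {"X": [], "Y": ["X"]}

# Conditional possibility tables pi(value | parent values).
cpt = {
    "X": {(0, ()): 1.0, (1, ()): 0.4},
    "Y": {(0, (0,)): 1.0, (1, (0,)): 0.2,
          (0, (1,)): 0.7, (1, (1,)): 1.0},
}

def joint_possibility(assignment, combine=min):
    """Factorize the joint possibility as the combination of each node's
    conditional possibility given its parents (the operator of Definition 2.1)."""
    value = 1.0
    for var in variables:
        pa_values = tuple(assignment[p] for p in parents[var])
        value = combine(value, cpt[var][(assignment[var], pa_values)])
    return value

# Enumerate the joint distribution induced by the DAG under min-combination.
for combo in cartesian(*(domains[v] for v in variables)):
    assignment = dict(zip(variables, combo))
    print(assignment, joint_possibility(assignment, combine=min))
```

Passing `combine=lambda a, b: a * b` instead of `min` yields the product-based factorization mentioned above; both start from the neutral value 1.0.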
A belief network, then, represents the conditional independence relations that exist in a given domain. Conditional independence is a relationship between variables or groups of variables that has the following properties [36]:

1. Trivial independence: $I(X \mid Z \mid \emptyset)$
2. Symmetry: $I(X \mid Z \mid Y) \Rightarrow I(Y \mid Z \mid X)$
3. Decomposition: $I(X \mid Z \mid Y \cup W) \Rightarrow I(X \mid Z \mid Y)$
4. Weak union: $I(X \mid Z \mid Y \cup W) \Rightarrow I(X \mid Z \cup Y \mid W)$
5. Contraction: $I(X \mid Z \mid Y) \wedge I(X \mid Z \cup Y \mid W) \Rightarrow I(X \mid Z \mid Y \cup W)$
6. Intersection: $I(X \mid Z \cup W \mid Y) \wedge I(X \mid Z \cup Y \mid W) \Rightarrow I(X \mid Z \mid Y \cup W)$

This characterization of conditional independence is as abstract as possible; thus, it makes no assumption about any particular uncertainty formalism used in order to recognize a given relationship as being an instance of a conditional independence relationship. Now, in learning from data, one has to define an operational criterion for identifying such relations from summarized information, as uncertainty distributions are. We will not review here the various techniques used in probability to detect such relations, the χ² test and its variations being the most classical ones. Our interest lies in defining a criterion for working with possibility distributions derived from data. It will allow us to infer, from the relations between two possibility distributions, whether the corresponding variables are independent or not.

As is the case in probability theory, such a criterion rests on the prior notion of conditional distribution. Two (or more) variables will be considered conditionally independent if their conditional distributions satisfy certain properties. But, while in probability there is a unique formulation for such conditional distributions, several different definitions have been proposed for possibilistic conditioning. We will just give them and then discuss several definitions for independence between variables.

Dempster conditioning [8]: It is a specialization of Dempster's rule of conditioning for evidence theory. Given two variables X and Y taking values in $\{X_1, X_2, \ldots, X_n\}$ and $\{Y_1, Y_2, \ldots, Y_m\}$, respectively, and the corresponding joint possibility distribution $\pi(X, Y)$, the conditional distribution $\pi(X \mid Y)$ is defined as

$$\pi(X \mid Y) = \frac{\pi(X, Y)}{\pi_Y(Y)}, \qquad \text{where } \pi_Y(Y) = \max_X \pi(X, Y).$$

Hisdal/Dubois conditioning [24,10]: Under the same conditions,

$$\pi(X \mid Y) = \begin{cases} \pi(X, Y) & \text{if } \pi(X, Y) < \pi(Y), \\ 1 & \text{otherwise.} \end{cases}$$

See [39] for a discussion of the adequacy of these definitions.

Now, independence between variables, as we remarked, will require some kind of comparison between their distributions (marginal and conditional), so one or the other of the above conditioning operators will be used in establishing independence. However, at a more abstract level, possibilistic independence between variables or groups of variables can be understood in terms of mutual information relevance.
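Before turning to the formal definition of independence, the following small numeric illustration, with invented possibility values, shows how the two conditioning operators just quoted behave on a joint possibility table over two finite variables.

```python
# Toy joint possibility pi(x, y); values are invented and normalized so that
# the maximum over all cells equals 1.
joint = {
    ("x1", "y1"): 1.0, ("x2", "y1"): 0.3,
    ("x1", "y2"): 0.6, ("x2", "y2"): 0.6,
}
xs, ys = ["x1", "x2"], ["y1", "y2"]

def marginal_y(y):
    """Possibilistic marginal pi_Y(y) = max over x of pi(x, y)."""
    return max(joint[(x, y)] for x in xs)

def dempster(x, y):
    """Dempster conditioning: pi(x | y) = pi(x, y) / pi_Y(y)."""
    return joint[(x, y)] / marginal_y(y)

def hisdal(x, y):
    """Hisdal/Dubois conditioning: pi(x | y) = pi(x, y) if pi(x, y) < pi_Y(y),
    and 1 otherwise."""
    return joint[(x, y)] if joint[(x, y)] < marginal_y(y) else 1.0

for y in ys:
    for x in xs:
        print(f"pi({x} | {y}):  Dempster = {dempster(x, y):.2f}  "
              f"Hisdal = {hisdal(x, y):.2f}")
```

Note how the two operators agree where the joint value is strictly below the marginal, and differ on the maximal cells, which Hisdal/Dubois conditioning pushes to 1.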
Fonck [15] adheres to this view, putting forth the following interpretation. Conditional independence as mutual information irrelevance: given three sets of variables X, Y, Z, saying that X is independent of Y given Z amounts to the assertion that once the values of Z are known, further information about Y is irrelevant to X and further information about X is irrelevant to Y.

Given the sets X, Y, Z, the independence relation $I(X \mid Z \mid Y)$ is true iff

$$\pi^c_{\{X \mid Y \cup Z\}} = \pi^c_{\{X \mid Z\}} \quad \text{and} \quad \pi^c_{\{Y \mid X \cup Z\}} = \pi^c_{\{Y \mid Z\}},$$

where $\pi^c$ is the distribution that results from applying the combination operator c (i.e. a norm or the corresponding conorm). This definition is stricter than the one that had previously been taken as a test for independence in possibilistic settings: non-interactivity. Non-interactivity [49] means equality between the joint distribution and the factorized combination of the marginal distributions, analogously to the traditional factorization property in probability theory. Fonck has proven that this definition does not satisfy all the independence axioms mentioned before, but hers does [16].

Another similar line of work is followed by Huete [25], who explores three different views on independence:

1. Independence as no change in information: when the value of variable Z is known, knowing variable Y does not change the information about the values of X. This can be understood as information about Y being irrelevant for X when Z is known. Note that this is a less strict definition than Fonck's, in the sense that only one such test is to be done (Fonck's symmetrical condition is not required here).
2. Independence as no information gain: when the value of variable Z is known, knowing variable Y brings no additional information about the values of X. In other words, conditioning represents no information gain.
3. Independence as similar information: when the value of variable Z is known, knowing variable Y brings information about the values of X that is similar to the information that referred to X before knowing the value of Y.

These three notions of independence are studied by Huete using Hisdal's and Dempster's conditioning operators. The interested reader is referred to [25].

3. Measuring dependence through similarity between distributions

The different interpretations of independence commented on above do not reflect completely separate concepts. In fact, they can complement each other. We have adopted an independence characterization based on similarity but that also has some relation to information relevance.

Independence between X and Y can be related to the similarity between the marginal possibility distribution $\pi(X)$ and the conditional distribution obtained after conditioning on Y, $\pi(X \mid Y)$. Extending to the three-variable case,

$$I(X \mid Z \mid Y) \iff \pi_c(x \mid y, z) \approx \pi_c(x \mid z) \quad \forall x, y, z,$$

where $\pi_c$ is the distribution obtained by applying one of the usual conditioning operators for possibility distributions and $\approx$ is read as "is similar to".

Similarity between distributions admits several definitions. Let us suppose in the following that two distributions, π and π', are being compared. These are the current similarity definitions used [25]:
• Iso-ordering:
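The excerpt cuts off before the similarity measures (iso-ordering and the rest) are spelled out, so the following sketch only illustrates the overall shape of the similarity-based test stated above, $I(X \mid Z \mid Y) \iff \pi_c(x \mid y, z) \approx \pi_c(x \mid z)$. It assumes Dempster-style conditioning and uses a plain maximum-absolute-difference tolerance as a stand-in for the paper's similarity definitions; the function names, the tolerance and the toy joint distribution are all assumptions.

```python
from itertools import product

def marginal(joint, fixed):
    """Possibilistic marginal of a partial assignment: max of the joint
    possibility over all tuples compatible with `fixed` (index -> value)."""
    return max(v for key, v in joint.items()
               if all(key[i] == val for i, val in fixed.items()))

def cond(joint, x_index, x_val, given):
    """Dempster-style conditional pi(x | given) = pi(x, given) / pi(given)."""
    num = marginal(joint, {**given, x_index: x_val})
    den = marginal(joint, given)
    return num / den if den > 0 else 0.0

def similarity_independent(joint, dom_x, dom_y, dom_z, tol=0.05):
    """Accept I(X|Z|Y) when pi(x | y, z) stays within `tol` of pi(x | z)
    for every x, y, z -- max absolute difference as a stand-in similarity."""
    for x, y, z in product(dom_x, dom_y, dom_z):
        if abs(cond(joint, 0, x, {1: y, 2: z})
               - cond(joint, 0, x, {2: z})) > tol:
            return False
    return True

# Hypothetical joint possibility over tuples (x, y, z); indices 0, 1, 2
# address X, Y and Z respectively. Here Y carries no information about X.
joint = {(x, y, z): 1.0 if x == z else 0.5
         for x, y, z in product([0, 1], repeat=3)}
print(similarity_independent(joint, [0, 1], [0, 1], [0, 1]))  # True
```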
Related articles
Kernel-based Conditional Independence Test and Application in Causal Discovery
Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality the case of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis...
Learning Bayesian Network Model Structure from Data
In this thesis I address the important problem of the determination of the structure of directed statistical models, with the widely used class of Bayesian network models as a concrete vehicle of my ideas. The structure of a Bayesian network represents a set of conditional independence relations that hold in the domain. Learning the structure of the Bayesian network model that represents a doma...
Causal Networks Learning Acausal Networks Learning Influence Diagrams Learning Causal-Network Parameters Learning Causal-Network Structure Learning Hidden Variables Learning More General Causal Models Advances: Learning Causal Networks
Bayesian methods have been developed for learning Bayesian networks from data. Most of this work has concentrated on Bayesian networks interpreted as a representation of probabilistic conditional independence without considering causation. Other researchers have shown that having a causal interpretation can be important, because it allows us to predict the effects of interventions in a domain. ...
Product-based Causal Networks and Quantitative Possibilistic Bases
In possibility theory, there are two kinds of possibilistic causal networks depending if possibilistic conditioning is based on the minimum or on the product operator. Similarly there are also two kinds of possibilistic logic: standard (min-based) possibilistic logic and quantitative (product-based) possibilistic logic. Recently, several equivalent transformations between standard possibilistic...
Learning Possibilistic Networks from Data
We introduce a method for inducing the structure of (causal) possibilistic networks from databases of sample cases. In comparison to the construction of Bayesian belief networks, the proposed framework has some advantages, namely the explicit consideration of imprecise (set-valued) data, and the realization of a controlled form of information compression in order to increase the efficiency of the...
Journal: Int. J. Approx. Reasoning
Volume: 18
Pages: 145-167
Year: 1998